The NTUT Blizzard Challenge 2009 Entry
Abstract
This paper describes the process of building HMM-based speech synthesis system (HTS) voices for our participation in the Blizzard Challenge 2009. Of the two languages required (English and Mandarin Chinese), we built voices only for Mandarin Chinese: three voices, for the main hub task (MH) and the two spoke tasks (MS1 and MS2). According to the evaluation results, our MH voice scored 3 points on both the mean opinion score (MOS) and similarity tests. In addition, in the intelligibility test it achieved pinyin error rates of 12.2% without tone (PER) and 17% with tone (PTER), and a character error rate (CER) of 23%. Moreover, our MS2 voice scored 4 and 3 points on the MOS and similarity tests, respectively. In conclusion, we now have reasonable text-to-speech (TTS) baselines (at least for Mandarin Chinese) for developing our own advanced prosody model in the future.
Similar resources
The NTUT Blizzard Challenge 2013 Entry
This paper describes our HMM-based speech synthesis system (HTS) [1] submitted to Blizzard Challenge 2013 [2]. The focus of this entry is to build a TTS system without using any of the provided information and to speed up the training procedures through parallel processing. In this system, the input text is tagged by the Stanford parser [3] and transformed into phone sequences by Flite’s letter-to-sound module [4]. Then ...
The NTUT Blizzard Challenge 2010 Entry
This paper describes our HMM-based speech synthesis system (HTS) submitted to Blizzard Challenge 2010. Three Mandarin Chinese voices were built this year for two hub tasks (MH1 and MH2) and one spoke task (MS1); the voice for MS2 is the same as the one for MH1. According to the evaluation results, our system scored on average 2 points on both the mean opinion score (MOS) and similarity tests for MH1, MH2 and M...
The NTUT Blizzard Challenge 2012 Entry
This paper describes our HMM-based speech synthesis system (HTS) [1] submitted to Blizzard Challenge 2012 [2]. This is our first English TTS system and also our first audiobook application. In this system, not only linguistic but also semantic features beyond the sentence level are extracted, including (1) the semantic topics and (2) the punctuation marks (PMs) of the current and surrounding sentences and (3) numb...
The VUB Blizzard Challenge 2009 Entry
In this paper we describe the voices we submitted to the 2009 Blizzard Challenge, a yearly challenge to evaluate auditory speech synthesis on common data. Since this is the second time we have participated in this challenge, we focus on the changes we made to our unit selection-based system. The weighted sum of symbolic target costs has been replaced by a single statistical target cost; t...
The AHOLAB Blizzard Challenge 2009 Entry
This paper describes the process of building unit selection voices for our participation in the Blizzard Challenge 2009. Of the three voices required (EH1: 15 hours of UK English, EH2: a 1-hour UK English subset, and MH: 6000 utterances of Mandarin Chinese), only the English ones were built. As far as the Hub Tasks are concerned, only the ES1 task was completed, using voice conversion techniques. The Ev...